Hamilton-Jacobi-Bellman (HJB) Equation

$ \newcommand{\pd}[2]{\frac{\partial#1}{\partial#2}} \newcommand{\B}[1]{\mathbf{#1}} \newcommand{\A}{\B{A}} \newcommand{\x}{\B{x}} \newcommand{\u}{\B{u}} $

From Bellman's principle of optimality we can derive the continuous-time counterpart of Bellman's equation, the Hamilton-Jacobi-Bellman equation. We start by defining the cost-to-go function $V(x(t), t)$: the minimum cost accumulated from time $t$ to the final time $T$, starting from the state $x(t)$.

$$ V(x(t), t) = \min_{u \in U} \left( \int_t^T C(x(\tau), u(\tau))\, d\tau + D(x(T))\right) $$

subject to the dynamics constraint $ \frac{d x(t)}{dt} = F(x(t),u(t))$ and the terminal condition $V(x(T), T) = D(x(T))$.
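As a running example, consider the scalar integrator $\dot{x} = u$ with running cost $C(x, u) = x^2 + u^2$ and zero terminal cost, $D(x(T)) = 0$. Its cost-to-go is

$$ V(x(t), t) = \min_{u \in U} \int_t^T \left( x(\tau)^2 + u(\tau)^2 \right) d\tau $$

We will return to this example after deriving the HJB equation.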

Now we use Bellman's principle of optimality:

$$V(x(t), t) = \min_{u \in U} \left( \int_t^{t + \Delta t} C(x(\tau), u(\tau))\, d\tau + \min_{u \in U} \left( \int_{t + \Delta t}^T C(x(\tau), u(\tau))\, d\tau + D(x(T))\right) \right) $$

The inner minimization is just the cost-to-go evaluated at $t + \Delta t$, so we recover the same recursive structure as before:

$$V(x(t), t) = \min_{u \in U} \left( \int_t^{t + \Delta t} C(x(\tau), u(\tau))\, d\tau + V(x(t+\Delta t), t + \Delta t)\right) $$

For small $\Delta t$ we approximate the integral by $C(x(t), u(t))\, \Delta t$ and expand $V(x(t+\Delta t), t + \Delta t)$ to first order in a Taylor series: $$V(x(t), t) = \min_{u \in U} \left( C(x(t), u(t)) \Delta t + V(x(t), t) + \frac{\partial V(x(t), t) }{\partial t} \Delta t + \frac{\partial V(x(t), t)}{\partial x} \frac{dx}{dt} \Delta t \right) $$

Since $V(x(t), t)$ does not depend on $u$, it cancels from both sides; dividing by $\Delta t$ and substituting the dynamics constraint then gives the standard form of the Hamilton-Jacobi-Bellman equation:

$$-\frac{\partial V(x(t), t) }{\partial t} = \min_{u \in U} \left( C(x(t), u(t)) + \nabla V(x(t), t) \cdot F(x(t), u(t)) \right)$$
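For the scalar running example, the minimization inside the HJB equation can be done in closed form: setting the derivative of $x^2 + u^2 + \pd{V}{x} u$ with respect to $u$ to zero gives $u^* = -\frac{1}{2}\pd{V}{x}$, and substituting back yields

$$-\pd{V}{t} = x^2 - \frac{1}{4}\left(\pd{V}{x}\right)^2$$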

This can be solved backward in time using the terminal condition: $$V(x, T) = D(x)$$
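In general there is no closed-form solution, so the HJB equation is integrated backward from the terminal condition on a discretized state space. The following is a minimal sketch of that idea for the scalar running example, using one Bellman backup per time step; the grid limits, control discretization, time step, and horizon are arbitrary choices made purely for illustration.

import numpy as np

xs = np.linspace(-2.0, 2.0, 201)    # state grid
us = np.linspace(-3.0, 3.0, 121)    # candidate controls
dt, T = 0.01, 2.0                   # time step and horizon (illustrative values)

V = np.zeros_like(xs)               # terminal condition V(x, T) = D(x) = 0
for _ in range(int(round(T / dt))):
    # one Bellman backup: V(x, t) = min_u [ C(x, u) dt + V(x + F(x, u) dt, t + dt) ]
    candidates = [(xs**2 + u**2) * dt + np.interp(xs + u * dt, xs, V) for u in us]
    V = np.min(candidates, axis=0)

# For this problem the infinite-horizon limit is V(x) = x^2, so V should
# approach x^2 in the interior of the grid as the horizon T grows.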

Linear Quadratic Regulator (LQR) Derivation from the HJB Equation

We define the optimal control problem by specifying the dynamics to be a linear system and the running cost to be quadratic:

$$ F(x,u) = A x + Bu $$
$$ C(x,u) = x^T Q x + u^T R u $$

where $Q, R$ are positive definite symmetric matrices.

Assume a solution of the form: $$ V(x, t) = x^T P(t)\, x$$

where $P(t)$ is a positive definite symmetric matrix.

We now apply the Hamilton-Jacobi-Bellman equation:

$$ \nabla V(x, t) = x^T (P + P^T) = 2 x^T P $$
$$-\pd{V(x, t)}{t} = \min_{u \in U} \left( x^T Q x + u^T R u + 2 x^T P(Ax + Bu) \right)$$

We find the minimizing $u$ by setting the partial derivative with respect to $u$ to zero:

$$2 u^T R + 2 x^TPB = 0$$

Taking the transpose, dividing by 2, and using the symmetry of $R$ and $P$: $$Ru + B^TPx = 0$$

Since $R$ is positive definite (and hence invertible), this stationary point is the minimum, and we can solve for $u$: $$u = -R^{-1}B^TPx$$

Note that the optimal policy obtained from the Hamilton-Jacobi-Bellman equation for the LQR problem, $u = -R^{-1}B^TPx$, is a linear state-feedback law: the control is a linear function of the state $x(t)$, with a gain determined by $P(t)$. This is a consequence of the cost-to-go $V(x,t)$ being a quadratic function of the state.
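Substituting this feedback law into the dynamics gives the closed-loop system

$$ \dot{x} = \left( A - BR^{-1}B^TP \right) x $$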

We now solve for the evolution of $P(t)$ by substituting the optimal control back into the HJB equation:

$$-x^T \dot{P} x = x^T Q x + x^T P B R^{-1} B^T P x + 2 x^T P A x - 2 x^T P B R^{-1} B^T P x$$

All of the terms are quadratic in $x$ and the equation must hold for every $x$; writing $2 x^T P A x = x^T (A^T P + P A)\, x$ and collecting terms, we can drop the $x$'s to obtain:

$$-\dot{P} = A^TP + PA - PBR^{-1}B^TP + Q$$
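For the scalar running example ($A = 0$, $B = 1$, $Q = R = 1$) this reduces to $-\dot{P} = 1 - P^2$, which with $P(T) = 0$ has the solution $P(t) = \tanh(T - t)$; as the remaining horizon grows, $P \to 1$, recovering $V(x) = x^2$ and $u = -x$.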

If we set $\dot{P} = 0$ we obtain the infinite-horizon LQR solution by solving the continuous-time algebraic Riccati equation (CARE).
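As a sanity check on this finite- versus infinite-horizon connection, the sketch below integrates the Riccati differential equation backward in time from a zero terminal cost, $P(T) = 0$, and compares the result with the CARE solution. It reuses the $A$, $B$, $Q$, $R$ from the notebook example below; the ten-time-unit horizon is an arbitrary choice.

import numpy as np
from scipy.integrate import solve_ivp
import control

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0], [2.0]])
Q = np.eye(2)
R = np.eye(1)

def riccati_rhs(s, p_flat):
    # s is "time to go" (T - t), so dP/ds = A'P + PA - P B R^-1 B' P + Q
    P = p_flat.reshape(2, 2)
    dP = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q
    return dP.ravel()

sol = solve_ivp(riccati_rhs, (0.0, 10.0), np.zeros(4), rtol=1e-8)
P_horizon = sol.y[:, -1].reshape(2, 2)   # P with 10 time units remaining
P_inf, _, _ = control.care(A, B, Q, R)   # infinite-horizon CARE solution
print(P_horizon - P_inf)                 # the difference shrinks as the horizon grows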

LQR Example


In [1]:
%pylab inline
import control

# open-loop plant: dx/dt = A x + B u, with full state output
A = array([[1,2],[3,4]])
B = array([[1],[2]])
C = eye(2)
D = zeros((2,1))
ss = control.ss(A,B,C,D)

# quadratic cost weights
Q = eye(2)
R = eye(1)

# solve the continuous algebraic Riccati equation
# (P = Riccati solution, L = closed-loop eigenvalues, G = state-feedback gain)
P, L, G = control.care(A, B, Q, R)
P


Populating the interactive namespace from numpy and matplotlib
Out[1]:
array([[ 1.73598022,  0.49723745],
       [ 0.49723745,  1.86941526]])

In [2]:
# state-feedback gain K = R^{-1} B^T P (equal to the G returned by care)
K = linalg.inv(R).dot(B.T).dot(P)
# wrap the static gain as a state-space system so it can be used with feedback()
ss_contr = control.ss(0,array([0,0]).T,0, K)
ss_contr


Out[2]:
A = [[ 0.]]

B = [[ 0.  0.]]

C = [[ 0.]]

D = [[ 2.73045511  4.23606798]]

In [3]:
# close the loop with state feedback and inspect the closed-loop poles
ss_closed = ss.feedback(ss_contr)
ss_closed.pole()


Out[3]:
array([-5.8182729 , -0.38431817])

In [4]:
# simulate the closed-loop response from a nonzero initial condition
t, y, x = control.forced_response(ss_closed, X0=1.0, T=linspace(0,10))
plot(t, y.T)


Out[4]:
[<matplotlib.lines.Line2D at 0x3bc4c90>,
 <matplotlib.lines.Line2D at 0x3bc4f10>]